(54) Memory allocation in a multithreaded environment



(57) A method of allocating memory (16) in a multithreaded (parallel) computing environment in which
threads (30-33) running in parallel within a process are associated with one of a number of memory
pools (24, 38-40) of a system memory. The method includes the steps of establishing memory pools in
the system memory; mapping each thread to one of the memory pools; and, for each thread, dynamically
allocating user memory blocks from the associated memory pool. The method allows any existing memory
management malloc (memory allocation) package to be converted to a multithreaded version so that
multithreaded processes are run with greater efficiency.
Description

Background of the Invention 

The invention relates to memory allocation and more particularly to memory allocation in a multithreaded (parallel)
environment.

In allocating memory for a computer program, most older languages (e.g., FORTRAN, COBOL) require that the
size of an array or data item be declared before the program is compiled. Moreover, the size of the array or data item
cannot be exceeded unless the program is changed and re-compiled. Today, however, most modern programming
languages, including C and C++, allow the user to request memory blocks from the system memory at run-time and
release the blocks back to the system memory when the program no longer needs the blocks. For example, in these
modern languages, data elements often have a data structure with a field containing a pointer to a next data element.
A number of data elements may be allocated, at run-time, in a linked list or an array structure.

The C programming language provides memory management capability with a set of library functions known as
"memory allocation" routines. The most basic memory allocation function is called malloc, which allocates a requested
number of bytes and returns a pointer that is the starting address of the memory allocated. Another function, known as
free, returns the memory previously allocated by malloc so that it can be allocated again for use by other routines.

In applications in which memory allocation occurs in parallel, for example, in a multithreaded process, the malloc
and free functions must be "code-locked". Code-locking means that the library code of the process containing the
thread is protected with a global lock. This prevents data corruption in the event that one thread is modifying a global
structure when another thread is trying to read it. Code-locking allows only one thread to call any of the malloc functions
(e.g., malloc, free, realloc) at any given time, with other threads waiting until the thread is finished with its memory
allocation. Thus, in a multithreaded process in which memory allocation functions are used extensively, the speed of
the system is seriously compromised.
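
The code-locking arrangement described above can be illustrated with a minimal sketch. It assumes POSIX threads rather than the Solaris primitives used later in this document, and the names alloc_lock, locked_alloc and locked_free are hypothetical; the C library malloc and free stand in for the unprotected allocator body.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* One process-wide lock guards every allocator entry point ("code-locking"). */
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical wrapper: every thread must take the same lock to allocate. */
void *locked_alloc(size_t nbytes)
{
    void *p;

    pthread_mutex_lock(&alloc_lock);
    p = malloc(nbytes);            /* stands in for the unlocked allocator body */
    pthread_mutex_unlock(&alloc_lock);
    return p;
}

/* Hypothetical wrapper: releases go through the same single lock. */
void locked_free(void *p)
{
    pthread_mutex_lock(&alloc_lock);
    free(p);
    pthread_mutex_unlock(&alloc_lock);
}
```

Because both wrappers contend for alloc_lock, two threads that allocate at the same moment are serialized even when they touch unrelated memory.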


Summary of the Invention 

In general, in one aspect, the invention is a method of allocating memory in a multithreaded computing environment
in which threads running in parallel within a process each have an associated memory pool in a system memory. The
method includes the steps of establishing memory pools in the system memory; mapping each thread to one of the
memory pools; and, for each thread, dynamically allocating user memory blocks from the associated memory pool.
Each thread uses memory allocation routines (e.g., malloc) to manipulate its own memory pool, thereby providing
greater efficiency of memory management.

The invention converts an existing memory management malloc package to a multithreaded version so that
multithreaded processes are run with greater efficiency. Moreover, the invention is applicable to any application requiring
memory management in parallel; in particular, those applications requiring significant parallel memory management.
Furthermore, use of the invention is transparent from the application programmer's standpoint, since the user interface
is the same as that of the standard C library memory management functions (i.e., malloc, free, realloc).

In a preferred embodiment, the method may further include the step of preventing simultaneous access to a memory
pool by different threads. Having separate memory pools allows separate code-locking (e.g., mutex locking) to prevent
simultaneous access to the memory pools by the different threads, thereby eliminating the possibility of data corruption.
In existing standard memory allocation routines suitable for parallel execution, there is only a single code lock. Thus,
only one thread can make a memory allocation routine call at any given time. All other threads running in the process
must wait until the thread finishes with its memory allocation operation. In the invention, on the other hand, so long as
each thread is manipulating its own memory, memory allocation operations can be performed in parallel without any
delay. The separate code-locking feature only becomes important when a thread attempts to access the memory pool
of another thread. Such memory allocations of a memory pool not associated with that thread are fairly uncommon.
Thus, the invention provides an improvement in the performance of the multithreaded process by significantly reducing
time delays associated with memory allocation routine calls.

Preferred embodiments may include one or more of the following features. The step of dynamically allocating
memory blocks includes designating the number of bytes in the block desired to be allocated. For example, calling the
malloc function will allocate any number of required bytes up to a maximum size of the memory pool. The step of
establishing a memory pool for each thread may further include allocating a memory buffer of a preselected size
(e.g., 64 Kbytes). In the event that the size of the memory pool has been exhausted, the size of the memory pool may
be dynamically increased by allocating additional memory from the system memory in increments equal to the
preselected size of the buffer memory. Moreover, the method may further include allowing one of the threads to transfer
memory from the memory pool of another of the threads to its memory pool.

Each memory pool may be maintained as a data structure of memory blocks, for example, an array of static
variables identified by a thread index associated with one of the memory pools. The data structure includes a header
which includes the size of the memory block and the memory pool index to which it is associated. The size of the block
and the memory pool index may both be, for example, four bytes.

The method may further include the step of allowing each thread to deallocate or free a memory block to the
memory pool. The application may require that the memory block be freed from the thread which originally allocated
the memory block. Other applications may allow the memory block to be freed from a thread which did not originally
allocate the block.

Coalescing or merging deallocated (or freed) memory blocks may be performed to unite smaller fragmented blocks. 
However, the method prevents coalescing of memory blocks from different pools. 

In the event that the size of a memory block needs to be enlarged in order to store more data elements, the size
of a block of memory allocated from a memory pool may be changed using a realloc routine. The method
requires that realloc preserve the original memory pool.

In general, in another aspect, the invention is a computer-readable medium storing a computer program for
allocating memory in a multithreaded computing environment in which threads run in parallel within a process, each thread
having access to a system memory. The stored program includes computer-readable instructions: (1) which establish
a plurality of memory pools in the system memory; (2) which map each thread to one of said plurality of memory pools;
and (3) which, for each thread, dynamically allocate user memory blocks from the associated memory pool. A
computer-readable medium includes any of a wide variety of memory media such as RAM or ROM memory, as well as external
computer-readable media, for example, a computer disk or CD ROM. A computer program may also be downloaded
into a computer's temporary active storage (e.g., RAM, output buffers) over a network. For example, the
above-described computer program may be downloaded from a Web site over the Internet into a computer's memory. Thus, the
computer-readable medium of the invention is intended to include the computer's memory which stores the
above-described computer program that is downloaded from a network.

In another aspect of the invention, a system includes memory, a portion of which stores the computer program
described above, a processor for executing the computer-readable instructions of the stored computer program and
a bus connecting the memory and processor.

Other advantages and features will become apparent from the following description of the preferred embodiment
and from the claims.

Brief Description of the Drawings

Fig. 1 is a block diagram of a multi-processing computer system which is suitable for use with the invention.
Fig. 2 illustrates the relationship between a multithreaded application and a shared memory.
Fig. 3 diagrammatically illustrates a data object in memory.
Fig. 4 illustrates the relationship between a multithreaded application and a shared memory in which more threads
than memory pools exist.
Fig. 5 is an example of an application which calls memory management functions from threads running within a
process.

Description of the Preferred Embodiments

Referring to Fig. 1, a simplistic representation of a multi-processing network 10 includes individual processors
12a-12n of comparable capabilities interconnected to a system memory 14 through a system bus 16. All of the processors
share access to the system memory as well as other I/O channels and peripheral devices (not shown). Each processor
is used to execute one or more processes, for example, an application.

Referring to Fig. 2, an application 20 which may be running on one or more of the processors 12a-12n (Fig. 1) is
shown. Application 20 includes, here, a single thread 22 which has access to a section 24 of allocated memory within
the system memory 14. This memory section is referred to as a memory pool. The application also includes a
multithreaded portion shown here having four threads 30-33. Although four threads are shown, the number of threads
running at any given time can change since new threads may be repeatedly created and old threads destroyed during
the execution of the application. Each of threads 30-33, for example, may run on a corresponding one of processors
12a-12n. In other applications, all or multiple threads can run on a single processor. Thread 30 is considered to be the
main thread which continues to use the memory section 24 allocated by the application as a single thread. However,
additional threads 31-33 allocate their own memory pools 38-40 from the system memory 14. Thus, each thread is
associated with a memory pool for use in executing its operations. During the execution of the application running on
the threads, each thread may be repeatedly allocating, freeing and reallocating memory blocks from its associated
memory pool using memory allocation functions (i.e., malloc, free, realloc) which are described in greater detail below.
Moreover, while one thread is generally designated as the main thread, some of the remaining threads may be
designated for particular purposes.

Establishing Memory Pools 

The number of memory pools (NUM_POOLS) is fixed. Although the malloc package programmer can change the
number of pools, the package must be rebuilt after doing so.

Establishing a memory pool for each thread includes allocating a memory buffer of a preselected size (e.g., 64
Kbytes). In the event that the size of the memory pool has been exhausted, the size of the memory pool may be
dynamically increased by allocating additional memory from the system memory. The additional memory may be
allocated, for example, using a Unix system routine called sbrk() which, in this implementation, is called internally from
within malloc and allocates the additional memory in increments equal to the preselected size of the buffer memory.
Allocating additional memory requires the pool to be locked, which prevents other memory functions from being performed
on that pool at the same time. Thus, the size of the memory buffer is selected to be large relative to the average amount
of memory requested by malloc() so that calls for increasing the size of the pool are infrequent.
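
As an illustration of growing a pool in fixed increments, the following sketch computes how many bytes would be requested from the system for a pending request. The helper name pool_growth_bytes and the round-up rule are assumptions made for this example; the text only states that growth occurs in increments equal to the preselected buffer size.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_INCREMENT (64u * 1024u)   /* preselected buffer size, 64 Kbytes */

/* Hypothetical helper: bytes to request from the system so that a pending
 * request of `needed` bytes fits, rounded up to whole increments. */
size_t pool_growth_bytes(size_t needed)
{
    size_t increments = (needed + POOL_INCREMENT - 1) / POOL_INCREMENT;

    if (increments == 0)
        increments = 1;            /* always grow by at least one increment */
    return increments * POOL_INCREMENT;
}
```

Under these assumptions a 100-byte request grows the pool by one 64-Kbyte increment, so many later small requests can be served without another locked call to the system.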

Each memory pool may be set up as a binary tree data structure with individual blocks of memory comprising the
pool. The binary tree is ordered by size, although it may be ordered by address. Other data structures (e.g., linked
lists) may alternatively be used; however, a binary tree structure may be preferred because of the increased speed it
offers in searching. Moreover, a balancing or self-adjusting algorithm may be used to further improve the efficiency of
the search.

Referring to Fig. 3, each block of memory 40 is identified by a data object 40 having a header 42 with a length
consistent with the alignment requirements of the particular hardware architecture being used. For example, certain
hardware configurations used by Sun Microsystems Inc., Mountain View, CA, require the header to be eight bytes in
length to provide an alignment boundary consistent with a SPARC architecture. The first four bytes of the header
indicate the size of the block, with the remaining four bytes indicating a pool number.
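
The eight-byte header layout can be sketched as a C structure. The type name BlockHeader, the helper names and the use of fixed-width uint32_t fields are assumptions for illustration; the implementation in the Appendix uses a larger header.

```c
#include <assert.h>
#include <stdint.h>

/* Eight-byte block header as described: size first, pool number second. */
typedef struct {
    uint32_t size;   /* size of the block */
    uint32_t pool;   /* number of the pool the block belongs to */
} BlockHeader;

/* The user pointer starts immediately after the header. */
void *header_to_user(BlockHeader *h)
{
    return (void *)(h + 1);
}

/* Recover the header from a pointer previously handed to the user. */
BlockHeader *user_to_header(void *p)
{
    return (BlockHeader *)p - 1;
}
```

Storing the pool number next to the size lets a later free or realloc find the owning pool from the user pointer alone.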


Memory Management Functions 

Each thread 30-33 allocates memory for its memory pool using a set of memory allocation routines similar to those
from a standard C library. The basic function for allocating memory is called malloc and has the following syntax:

void * malloc (size)

where size indicates the number of bytes requested.

Another memory allocation routine is free, which releases an allocated storage block to the pool of free memory
and has the following syntax:

void free (old)

where old is the pointer to the block of memory being released.

Still another memory allocation routine is realloc, which adjusts the size of the block of memory allocated by malloc.
Realloc has the following syntax:

void * realloc (old, size)

where:

old is the pointer to the block of memory whose size is being altered; and

size is the new size of the block.
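
A short usage sketch of the three routines follows. It calls the standard C library versions, whose interface the multithreaded package is stated to share; the helper name copy_and_grow is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copy `text` into a fresh block, then resize the block
 * to `new_size` bytes. Returns NULL on failure; the caller frees the result. */
char *copy_and_grow(const char *text, size_t new_size)
{
    size_t len = strlen(text) + 1;
    char *buf = malloc(len);                 /* request len bytes */
    char *bigger;

    if (buf == NULL)
        return NULL;
    memcpy(buf, text, len);

    if (new_size < len)
        new_size = len;                      /* never shrink below the contents */
    bigger = realloc(buf, new_size);         /* existing contents are preserved */
    if (bigger == NULL) {
        free(buf);                           /* old block is still valid on failure */
        return NULL;
    }
    return bigger;
}
```

The caller eventually passes the returned pointer to free, which returns the block to the pool of free memory.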

Converting an Existing Malloc Package to a Multithreaded Malloc Package

In order to convert an existing memory management package which uses a single lock to a parallel memory
management package, all static variables used in the above-described memory management functions are converted into
static arrays. For example, the binary tree structures associated with the memory pools are stored as a static array.
Each element of the static array is identified by its thread index and is associated with a given memory pool. There is
a separate static array element within each array for each pool. Thus, searching through the particular data structure
(e.g., binary tree) for each thread can be performed in parallel.
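
The conversion of a static variable into a static array can be sketched as follows. For brevity the sketch uses a singly linked free list rather than a binary tree, and the names Node, free_list, push_free and pop_free are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_POOLS 4

typedef struct Node { struct Node *next; } Node;

/* Single-lock package: one static variable shared by all threads.
 *     static Node *free_list;
 * Converted package: one array element per pool. */
static Node *free_list[NUM_POOLS];

/* Each helper touches only the array element selected by `pool`. */
void push_free(unsigned pool, Node *n)
{
    n->next = free_list[pool];
    free_list[pool] = n;
}

Node *pop_free(unsigned pool)
{
    Node *n = free_list[pool];

    if (n != NULL)
        free_list[pool] = n->next;
    return n;
}
```

Because operations on different pools touch different array elements, they share no state and need no common lock.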

Each thread, therefore, can repeatedly execute any of the above routines to manage memory allocation of its
associated memory pool. For example, referring again to Fig. 2, main thread 30 may execute a procedure in which
memory blocks within memory pool 24 may be allocated, freed, and allocated again numerous times. Simultaneously,
threads 31-33 may be executing procedures in which memory is being allocated and freed from and to their respective
memory pools 38-40.

Mapping Threads to Memory Pools 

Whenever a memory allocation function is called, a thread-identifying routine within each one of these functions
is used to identify the thread making the memory allocation request. The thread-identifying function returns the thread
index of the thread making the request. For example, the Solaris Operating System (OS), a product of Sun Microsystems
Inc., uses in one implementation a function called thr_self().

Another algorithm is then used to map each thread index to a memory pool number. For example, the described
embodiment uses the following macro, known as GET_THREAD_INDEX, which receives the thread index and returns
an associated pool number:

#define GET_THREAD_INDEX(self) \
    ((self) == 1 ? 0 : 1 + ((self)-4) % (NUM_POOLS-1))

where:

self is the thread index; and

NUM_POOLS is the number of memory pools.

As mentioned above, one thread is generally designated as the main thread with remaining threads designated for
other purposes. For example, the SOLARIS OS uses a thread numbering system which reserves the first thread as a
main thread, the second and third threads as system threads and subsequent threads as user threads. With the above
macro, the memory pools are numbered 0 to NUM_POOLS-1. The first portion of the above macro (self == 1 ? 0)
ensures that the main thread is always associated with the first pool number. Thus, if self is equal to 1 (i.e., it is the
main thread), then the pool number is 0. Otherwise, as shown in the remaining portion of the macro after the ':', the
remainder of dividing the thread index minus the constant four by NUM_POOLS-1 is added to the number
1 to arrive at the pool number. For example, if there are only four memory pools (i.e., NUM_POOLS = 4) and the thread
index is 4, the associated pool number returned by the macro is 1. Thread indices of 5 and 6 would have associated
memory pools numbered 2 and 3, respectively.
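
The worked example can be checked with a small sketch that wraps the macro in a function. The wrapper name pool_for_thread is hypothetical, and NUM_POOLS is fixed at 4 to match the example.

```c
#include <assert.h>

#define NUM_POOLS 4

/* Thread index 1 (main thread) maps to pool 0; user threads, numbered
 * from 4, are spread round-robin over pools 1 .. NUM_POOLS-1. */
#define GET_THREAD_INDEX(self) \
    ((self) == 1 ? 0 : 1 + ((self) - 4) % (NUM_POOLS - 1))

/* Function wrapper so the mapping can be called and tested directly. */
unsigned pool_for_thread(unsigned self)
{
    return GET_THREAD_INDEX(self);
}
```

With four pools, thread indices 1, 4, 5 and 6 map to pools 0, 1, 2 and 3, and thread index 7 wraps around to pool 1.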

In applications in which the number of threads existing at any given time exceeds the number of established pools,
the additional threads share memory pools with another thread associated with that pool. Referring to Fig. 4, for
example, an application is shown in which a new fifth thread 34 has been created. Because only four memory pools were
established, the above mentioned macro is used to map thread 34 to first memory pool 24 originally associated with
only thread 30. In this situation, the mutex lock associated with memory pool 24 prevents access by either thread 30
or 34, if the other is using the pool. In the example of the preceding paragraph, macro GET_THREAD_INDEX would
map threads having thread indices of 4 and 7 to memory pool #1.

Code-Locking Memory Pools 

Each memory pool 24 and 38-40 is protected by its own mutual exclusion (mutex) lock. Like the data structures
associated with each memory pool, mutex locks are stored in a static array. Each mutex lock causes no delay in a
thread that is allocating, deallocating or reallocating one or more memory blocks from its own memory pool. However,
when a thread not associated with a particular memory pool attempts to access a memory block already allocated by
the thread associated with that pool, the mutex lock prevents the non-associated thread from deallocating or reallocating
a memory block from that pool. Thus, the lock protects the memory blocks from being updated or used by more than
one thread at a time, thereby preventing the corruption of data in the memory block. Such attempts to allocate,
deallocate or reallocate memory blocks from a memory pool not associated with a thread are relatively infrequent. This
feature provides a substantial improvement in the speed performance of the system over conventional schemes in
which a single mutex lock is used for all memory management routines. Using a single mutex lock can significantly
degrade the performance of a multithreaded application. With this approach, once a thread makes a memory
management function call (i.e., malloc, free, or realloc) all other threads must wait until the thread has finished performing its
memory management function. By providing separate mutex locks for each memory pool, each thread can, in parallel,
allocate and free its own memory within its own memory pool while preventing access from non-associated threads.

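
Per-pool locking can be sketched as follows, assuming POSIX threads in place of the Solaris mutex routines used in the Appendix. The names pool_locks, pool_bytes, pool_alloc and pool_usage are hypothetical, and the C library malloc stands in for the search of a pool's own data structure.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NUM_POOLS 4

/* One lock and one usage counter per pool, stored in static arrays. */
static pthread_mutex_t pool_locks[NUM_POOLS] = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
};
static size_t pool_bytes[NUM_POOLS];   /* bytes handed out per pool */

/* Only the lock of `pool` is taken, so threads mapped to different
 * pools do not wait on each other. */
void *pool_alloc(unsigned pool, size_t nbytes)
{
    void *p;

    pthread_mutex_lock(&pool_locks[pool]);
    p = malloc(nbytes);            /* stands in for the per-pool search */
    if (p != NULL)
        pool_bytes[pool] += nbytes;
    pthread_mutex_unlock(&pool_locks[pool]);
    return p;
}

/* Read a pool's counter under the same lock that protects its updates. */
size_t pool_usage(unsigned pool)
{
    size_t n;

    pthread_mutex_lock(&pool_locks[pool]);
    n = pool_bytes[pool];
    pthread_mutex_unlock(&pool_locks[pool]);
    return n;
}
```

A thread contends for a lock only when another thread is using the same pool, which occurs when threads share a pool or when one thread frees or reallocates a block owned by another.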
As memory blocks are repeatedly allocated, freed and reallocated by a thread, the memory pool may become
fragmented into smaller and smaller blocks. Coalescing or merging of freed memory blocks which are contiguous is
periodically performed to form larger memory blocks which can be reallocated by the thread. However, before a memory
block can be coalesced with an adjacent memory block, the described embodiment first determines whether the blocks
are from the same pool. If not, the blocks are not coalesced, thus avoiding the possibility of data corruption.
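
The same-pool test that precedes coalescing can be sketched as a predicate. The type name Block and the function name can_coalesce are hypothetical; sizes are counted in header-sized units, as in the Appendix.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    unsigned size;   /* block length, in header-sized units */
    unsigned pool;   /* owning pool number */
} Block;

/* Two free blocks may be merged only if `lo` ends exactly where `hi`
 * begins and both carry the same pool number. */
bool can_coalesce(const Block *lo, const Block *hi)
{
    return lo + lo->size == hi && lo->pool == hi->pool;
}
```

Adjacent blocks with different pool numbers remain separate, so a block is never absorbed into a region managed under another pool's lock.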


Merge Malloc Pools

The extent to which the individual threads use memory management may vary significantly. For example, referring
again to Fig. 2, threads 31-33 may complete their tasks prior to the completion of the tasks performed by main thread
30. In such situations, the main thread may call an optional interface function which transfers the memory allocated
by threads 31-33 to the main thread 30. In other words, the function may be called by the main thread at the end of
the multithreaded portion to consolidate to the main thread the memory previously allocated by the other threads. The
routine used in this embodiment has the following prototype:

void merge_malloc_pools (void);

The use of this function may not be needed in applications in which the multiple threads perform significant memory
management throughout the application.

Referring to Fig. 5, a simplistic representation of an application is shown running within main thread 30 and user
thread 31. It is assumed here that memory pools 24 and 38 (Fig. 2), which are associated with threads 30 and 31,
respectively, have already been established. With respect to main thread 30, a first malloc routine call 50 is made
requesting a block of memory having SIZE#1 bytes. Later in the application, a first free routine call 52 is made to return
a block of memory identified by pointer OLD. At this time, coalescing is generally performed to combine the returned
block of memory with an adjacent block, so long as they are both from the same memory pool. Still later in the thread,
a second malloc routine call 54 is made requesting a block of memory having SIZE#2 bytes. A realloc call 56 requesting
that a block of memory identified by pointer OLD be resized to SIZE#3 bytes follows. Thread 31 is shown executing
procedures concurrently with thread 30. For example, a first malloc routine call 60 is made followed sometime later by
a first free routine call 62. Finally, in this example, after completion of the multithreaded portion of the application, a
merge_malloc_pools routine 64 is called to consolidate memory blocks allocated by thread 31 to the main thread 30.

Attached as an Appendix is source code software for one implementation of a method of converting an existing
malloc package to a multithreaded version of a malloc package. The source code represents a version of the program
based on the set of memory allocation routines described in The C Programming Language, B.W. Kernighan and D.M.
Ritchie, Prentice Hall (1988).

Other embodiments are within the following claims. 
/* A multithreaded (MT-hot) version of malloc and friends.
   Based on the malloc package from
   the Kernighan & Ritchie ANSI C book (page 185), with modifications.
   By Greg Nekhimovsky
   Sun Microsystems, Inc.
   Market Development Engineering
   January 1996

   Changes from the original K&R version:

   - All malloc routines are made MT-safe.
   - realloc() is added.
   - A separate free pool is created for each thread up to NUM_POOLS.
     NUM_POOLS is currently set to 4 but can be adjusted.
   - The pool number is stored in the same header.
   - Increased header size to 16 bytes to make room for pool number.
   - Each pool is protected by its own mutex lock.
   - free() returns the freed block to the pool the block was malloc'ed from.
   - realloc() modifies the block in the original pool.
   - Coalescing (merging) is only done within the same pool.
   - A new routine is added for external interface: merge_malloc_pools().
     If called by the application from the main thread after the threaded
     section is over, it transfers all memory blocks from the additional pools
     back to the main pool. This call will eliminate a "memory leak", in a
     sense that the main thread can reuse memory used by other threads.
   - To reduce additional fragmentation, the default block size for sbrk()
     is increased from 8K to 64K.

   When multiple threads allocate and deallocate their own memory, they don't
   wait on their own locks. When one thread tries to free or reallocate memory
   allocated by another thread, the lock protects the free list from being
   updated or used by more than one thread at a time. That lock is also used when
   there are more than NUM_POOLS threads active at the same time.
*/



#include <stdio.h>
#include <thread.h>
#include <synch.h>

/* number of pools for different threads */
#define NUM_POOLS 4

/* minimum number of 16-byte units to request from system - 64K in this case */
#define NALLOC 4096

#define MAGIC 0x5555   /* to check integrity of pointers to free, realloc */
#ifdef _REENTRANT      /* will not compile without _REENTRANT defined */
static mutex_t _pool_locks[NUM_POOLS];
#endif /* _REENTRANT */

#define GET_THREAD_INDEX(self) (((self) == 1) ? 0 : 1 + ((self) - 4) % (NUM_POOLS - 1))

#ifndef NULL
#define NULL (0)
#endif

typedef double Align;

/* increased the header size to 16 bytes - GN */
union header {
    struct {
        union header *ptr;   /* next block if on the free list */
        unsigned size;       /* size of this block */
        unsigned pool;       /* pool number */
        unsigned magic;      /* for checking - using 4 extra bytes */
    } s;
    Align x[2];              /* need 16 bytes because of the pool number */
};
typedef union header Header;

static Header base[NUM_POOLS];   /* empty lists to get started */

/* starts of free lists */
static Header *freep[NUM_POOLS] = {NULL, NULL, NULL, NULL};

static void *malloc_locked(unsigned nbytes, unsigned thread_index);
static void free_locked(void *ap, unsigned thread_index);
static Header *morecore(unsigned nunits, unsigned thread_index);

void *malloc(unsigned nbytes)
{
    void *ret;
    unsigned self, thread_index;

    self = thr_self();
    thread_index = GET_THREAD_INDEX(self);

    mutex_lock(&_pool_locks[thread_index]);
    ret = malloc_locked(nbytes, thread_index);
    mutex_unlock(&_pool_locks[thread_index]);

    return ret;
}

static void *malloc_locked(unsigned nbytes, unsigned thread_index)
{
    Header *p, *prevp;
    unsigned nunits;

    nunits = (nbytes + sizeof(Header) - 1) / sizeof(Header) + 1;

    if ((prevp = freep[thread_index]) == NULL) {   /* no free list yet */
        base[thread_index].s.ptr = freep[thread_index] =
            prevp = &base[thread_index];
        base[thread_index].s.size = 0;
        base[thread_index].s.pool = thread_index;
        base[thread_index].s.magic = MAGIC;
    }
    for (p = prevp->s.ptr; ; prevp = p, p = p->s.ptr) {
        if (p->s.size >= nunits) {      /* big enough */
            if (p->s.size == nunits)    /* exactly */
                prevp->s.ptr = p->s.ptr;
            else {
                p->s.size -= nunits;
                p += p->s.size;
                p->s.size = nunits;
                p->s.pool = thread_index;
                p->s.magic = MAGIC;
            }
            freep[thread_index] = prevp;
            return (void *)(p + 1);
        }
        if (p == freep[thread_index])   /* wrapped around free list */
            if ((p = morecore(nunits, thread_index)) == NULL)
                return NULL;            /* none left */
    }
}



static Header *morecore(unsigned nu, unsigned thread_index)
{
    char *cp, *sbrk(int);
    Header *up;

    if (nu < NALLOC)
        nu = NALLOC;
    cp = sbrk(nu * sizeof(Header));   /* sbrk() assumed locked - GN */
    if (cp == (char *) -1)            /* no space at all */
        return NULL;
    up = (Header *) cp;
    up->s.size = nu;
    up->s.pool = thread_index;
    up->s.magic = MAGIC;
    free_locked((void *)(up + 1), thread_index);
    return freep[thread_index];
}

void free(void *ap)
{
    Header *bp;
    unsigned thread_index;
    int i;

    if (ap == NULL)
        return;

    /* free the block of the thread which allocated it */
    bp = (Header *)ap - 1;   /* point to block header */
    if (bp->s.magic != MAGIC) {
        printf("bogus pointer %x passed to free()\n", ap);
        abort();
    }
    thread_index = bp->s.pool;

    mutex_lock(&_pool_locks[thread_index]);
    free_locked(ap, thread_index);
    mutex_unlock(&_pool_locks[thread_index]);
}

static void free_locked(void *ap, unsigned thread_index)
{
    Header *bp, *p;
    int i;

    bp = (Header *)ap - 1;   /* point to block header */

    for (p = freep[thread_index]; !(bp > p && bp < p->s.ptr); p = p->s.ptr)
        if (p >= p->s.ptr && (bp > p || bp < p->s.ptr))
            break;   /* freed block at start or end of arena */

    if (bp + bp->s.size == p->s.ptr &&
        bp->s.pool == p->s.pool) {      /* join to upper nbr */
        bp->s.size += p->s.ptr->s.size;
        bp->s.ptr = p->s.ptr->s.ptr;
    } else
        bp->s.ptr = p->s.ptr;
    if (p + p->s.size == bp &&
        p->s.pool == bp->s.pool) {      /* join to lower nbr */
        p->s.size += bp->s.size;
        p->s.ptr = bp->s.ptr;
    } else
        p->s.ptr = bp;
    freep[thread_index] = p;
}



void *realloc(void *old, unsigned nbytes)
{
    /* Added by GN, since realloc() is not given in the K&R book */
    void *new;
    Header *bp;
    unsigned self, thread_index, ncopy;

    if (old == NULL)
        return malloc(nbytes);

    self = thr_self();
    thread_index = GET_THREAD_INDEX(self);

    bp = (Header *)old - 1;   /* point to block header */
    if (bp->s.magic != MAGIC) {
        printf("bogus pointer %x passed to realloc\n", old);
        abort();
    }

    if (bp->s.pool != thread_index)
        thread_index = bp->s.pool;
    mutex_lock(&_pool_locks[thread_index]);

    /* for simplicity,
       always allocate a new block, copy and free the old one */
    if ((new = malloc_locked(nbytes, thread_index)) == NULL) {
        mutex_unlock(&_pool_locks[thread_index]);
        return NULL;
    }

    if (nbytes > 0) {
        ncopy = sizeof(Header) * (bp->s.size - 1);
        if (ncopy > nbytes)
            ncopy = nbytes;
        memcpy(new, old, ncopy);
    }
    free_locked(old, thread_index);
    mutex_unlock(&_pool_locks[thread_index]);
    return new;
}

/* New externally called function merging all additional pools into
   the main thread's pool. Should only be called from the main thread.
   Assumes that only the main thread is active.
*/
void merge_malloc_pools(void)
{
    int i;
    Header *p, *prevp;

    /* skip the main thread's pool (0) */
    for (i = 1; i < NUM_POOLS; i++) {
        prevp = freep[i];
        if (prevp != NULL) {
            for (p = prevp->s.ptr; ; prevp = p, p = p->s.ptr) {
                if (p->s.size > 0) {
                    p->s.pool = 0;
                    /* no need to lock - main thread only */
                    free_locked((void *)(p + 1), 0);
                }
                if (p == freep[i])   /* end of list */
                    break;
            }
            freep[i] = NULL;
        }
    }
}
Claims

1. A method of allocating memory in a multithreaded computing environment in which a plurality of threads run in 
parallel within a process, each thread having access to a system memory, the method comprising: 

establishing a plurality of memory pools in the system memory; 
mapping each thread to one of said plurality of memory pools; and 

for each thread, dynamically allocating user memory blocks from the associated memory pool. 

2. The method of claim 1 wherein the step of dynamically allocating memory blocks includes designating the number 
of bytes in the block desired to be allocated. 

3. The method of claim 1 further comprising the step of preventing simultaneous access to a memory pool by different 
threads. 

4. The method of claim 1 wherein the step of establishing a memory pool for each thread comprises 
allocating a memory buffer of a preselected size. 

5. The method of claim 4 further comprising the step of dynamically increasing the size of the memory pool by 
allocating additional memory from the system memory in increments equal to the preselected size of the buffer memory. 

6. The method of claim 4 wherein the preselected size of the buffer is 64 Kbytes. 

7. The method of claim 1 further comprising the step of one of the threads transferring memory from the memory 
pool of another of the threads to its memory pool. 

8. The method of claim 1 wherein each memory pool is defined by an array of static variables identified by a thread 
index associated with a memory pool. 

9. The method of claim 8 wherein each memory pool is maintained as a data structure of memory blocks. 

10. The method of claim 9 wherein each memory block comprises a header including the size of the memory block 
and the memory pool index to which it is associated. 

11. The method of claim 10 wherein the size of the block and the memory pool index are each four bytes. 

12. The method of claim 1 further comprising the step of each thread deallocating a memory block to the memory pool. 

13. The method of claim 12 wherein the thread originally allocating the memory block deallocates it to its associated 
memory pool. 

14. The method of claim 12 further comprising the step of coalescing deallocated memory blocks and preventing 
coalescing of memory blocks from different pools. 

15. The method of claim 1 further comprising the step of changing the size of an allocated block of memory allocated 
by a memory pool. 

16. A computer-readable medium storing a computer program which is executable on a computer including a memory, 
the computer program for allocating memory in a multithreaded computing environment in which a plurality of 
threads run in parallel within a process, each thread having access to a system memory, the stored program 
comprising: 

computer-readable instructions which establish a plurality of memory pools in the system memory; 
computer-readable instructions which map each thread to one of said plurality of memory pools; and 
computer-readable instructions which, for each thread, dynamically allocate user memory blocks from the 
associated memory pool. 

17. The computer-readable medium of claim 16 wherein the stored program further comprises computer instructions 
which prevent simultaneous access to a memory pool by different threads. 

18. The computer-readable medium of claim 16 wherein the stored program further comprises computer instructions 
which cause one of the threads to transfer memory from the memory pool of another of the threads to its memory 
pool. 

19. The computer-readable medium of claim 16 wherein each memory pool is defined by an array of static variables 
identified by a thread index associated with a memory pool. 

20. The computer-readable medium of claim 16 wherein the stored program further comprises computer instructions 
which coalesce deallocated memory blocks and prevent coalescing of memory blocks from different pools. 

21. A system comprising: 

memory, a portion of said memory storing a computer program for allocating memory in a multithreaded com- 
puting environment in which a plurality of threads run in parallel within a process, each thread having access 
to the memory, the stored program comprising: 

computer-readable instructions which establish a plurality of memory pools in the memory; 
computer-readable instructions which map each thread to one of said plurality of memory pools; and 
computer-readable instructions which, for each thread, dynamically allocate user memory blocks from the 
associated memory pool; 

a processor to execute said computer-readable instructions; and 
a bus connecting the memory to the processor. 